Microsoft Phi-3 Cookbook (github.com/microsoft)
152 points by nonfamous 44 days ago | 57 comments



Looks like some of the docs were generated by an LLM. I see pictures with typos and made-up terms, incomplete text, etc. I wonder to what extent we can trust the rest of the docs.

https://github.com/microsoft/Phi-3CookBook/blob/main/md/04.F...


https://github.com/microsoft/Phi-3CookBook/commit/ba688b9a35...

Scroll down to the end and the removed text is totally suspect. I wouldn't be too surprised if all of this was generated by an LLM and then a human edited out anything that looked strange. Another reason not to leave everything to the LLM.


Gool for specific tucks!


Good for specific tasks?


You can interact with the new Phi-3 vision model on this page (no login required): https://ai.azure.com/explore/models/Phi-3-vision-128k-instru...


I get a "the request was blocked" error.


Works fine here on Firefox on macOS and not logged in.


"We are introducing Phi Silica which is built from the Phi series of models and is designed specifically for the NPUs in Copilot+ PCs. Windows is the first platform to have a state-of-the-art small language model (SLM) custom built for the NPU and shipping inbox. Phi Silica API along with OCR, Studio Effects, Live Captions, Recall User Activity APIs will be available in Windows Copilot Library in June. More APIs like Vector Embedding, RAG API, Text Summarization will be coming later."

2024: the year of personal computers with neural processing units running small language models

How do NPUs work? Who builds them, and how are they built? Are they capable of running a variety of SLM-like firmware?


For the price of these new laptops, one can already buy a MacBook with a general-purpose GPU that is more than capable of running these models. One can also buy a Windows laptop with a dedicated graphics card that can run these models. Perhaps it would be interesting if the price were substantially lower.


These new laptops are half the price of MacBooks with the same RAM and storage.


In which configs and which geographic locations? In the US, pricing for equivalent Surface and Mac hardware configurations looks the same, to me.

And that's if the Snapdragon X Elite is actually on par with the Apple M3, like Microsoft claims.

Earlier X Elite benchmarks[1][2] showed that it was behind the M3, but hopefully Qualcomm have made some solid performance changes since then. Competition is good.

1. https://www.xda-developers.com/snapdragon-x-elite-benchmarks... Note: this link compares against the M2.

2. https://www.tomshardware.com/pc-components/cpus/early-snapdr...


US. Canada. Probably other regions. A Surface Laptop with 16GB RAM and a 512GB SSD is $1200 USD. An M3 MBA with 16GB RAM and a 512GB SSD is $1500.


An equivalent-specced Surface laptop would be the following specs:

- 13" or 15" screen

- Snapdragon X Elite (which still doesn't match M3 performance)

- 16 GB memory

- 512 GB SSD

Surface Laptop pricing is $1399 (13") or $1499 (15"), which is $100 cheaper than the equivalent 13" MBA and $200 cheaper than the equivalent 15" MBA. The Snapdragon X Plus, which is what it looks like you priced out, is quite inferior to the Elite and even to the older Apple M1/M2 chips.

This is all to say that neither is a bad buy, depending on your needs.


Also worth noting that the MacBook Air still hasn't been upgraded to a MicroLED display (it uses IPS instead), whereas the Surface Laptop now comes standard with OLED.


No, it doesn't, at least not according to the specs: https://www.microsoft.com/en-us/surface/devices/surface-lapt...

The Surface Pro has an OLED config now, which costs more, but the Laptop and the standard Pro are still LCD.


I just bought an M3, my first MacBook in a while. I was appalled by the screen brightness and contrast. It’s really lacking.


Only the tablet has an OLED option, and that is a $500 extra.


Are we talking about AI/ML?

The X Plus and X Elite NPUs hit 45 TOPS. The M4 is 38 TOPS...

Also, as far as I know, the Elite does match the M3. Actually, Qualcomm promises that even the X Plus is able to match it, since they say the Elite is better than the M3.

We'll have to wait to see more benchmarks.

PS: There are three Elite SKUs.


  > Are we talking about AI/ML?
No, I was speaking to the usual single-core and multi-core benchmarks. Though you should buy based on what performance metrics you value most.

While I have a basic understanding of TOPS as a metric, I don't understand it well enough to say much, especially since I'm not sure what exact AI/ML workloads will run on these platforms. I mean, how many tokens/sec, and at what wattage, does that equate to?
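
For rough intuition, decode throughput for local models tends to be memory-bandwidth bound, so you can do a back-of-envelope estimate. A sketch where every number is an illustrative assumption, not a vendor spec:

  # Back-of-envelope decode speed for a memory-bandwidth-bound SLM.
  # All figures are assumptions for illustration, not vendor specs.
  params_billions = 3.8   # a Phi-3-mini-class model
  bytes_per_param = 0.5   # ~4-bit quantized weights
  model_gb = params_billions * bytes_per_param  # ~1.9 GB of weights
  bandwidth_gb_s = 100    # assumed effective memory bandwidth

  # Each generated token streams roughly the whole weight set once,
  # so bandwidth / model size bounds tokens/sec from above.
  print(f"~{bandwidth_gb_s / model_gb:.0f} tokens/sec ceiling")  # ~53

TOPS mostly matters for the compute-heavy prefill stage; wattage is a separate question that only real hardware reviews will answer.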

  > Also, as far as I know, the Elite does match the M3.
I would disagree from what I’ve seen.

  > Actually, Qualcomm promises that even the X Plus is able to match it, since they say the Elite is better than the M3.
Real-world benchmarks will tell. I am super curious about performance-per-watt, which is something that really matters to me (heat, battery life, etc.).

Can the Elite outperform Apple’s chips? Can it do it at comparable wattage, or is it going to burn your lap doing so? Can’t wait to see.


In single-core performance...

In multi-core, from what I've seen, the X Plus is faster than the base M3.


I couldn't find Geekbench results for an MBA M3, but here are the numbers for an MBP 14" M3:

Single: 3088, Multi: 11595

Non-production X Elite, from AnandTech[1]:

Single: 2800, Multi: 14400

Non-production X Plus, from AnandTech[1]:

Single: 2425, Multi: 13100

Slower single-core perf for both Snapdragons, and a decent jump in multi over the M3: ~24% for the Elite, ~13% for the Plus. I am eager to see how the production units will pan out on benchmarks and sustained performance.

Qualcomm's been shady in the past with their so-called "Apple killer" chip benchmarks. I don't think these are "Apple killers", but I hope it pans out, for competition's sake.

1. https://www.anandtech.com/show/21364/qualcomm-intros-snapdra...


The M3 was last year though; the M4 has just been released, albeit iPad-only for now.


And here are some real-world benchmarks:

M4 iPad Pro: https://browser.geekbench.com/v6/cpu/6036233 S: ~3747 M: ~14740

Snapdragon X Elite: https://browser.geekbench.com/search?utf8=%E2%9C%93&q=snapdr... S: ~2400 M: ~14000

Note this is an M4 iPad Pro. I would imagine an M4 Mac would benchmark a little better, due to fewer heat/power constraints.

tl;dr: M4 stomps on the Elite in single-core; the Elite is about 36% slower. The baseline M4 and X Elite are about even on multi-core. The X Elite has marginally better NPU TOPS performance, if that's something that matters to you.


ASUS Vivobook S15 S5507QA: 32GB/1TB, €1400

MacBook Air 15.3" (M3, 8-core CPU, 10-core GPU): 24GB/1TB, €2649


1. Ah I had assumed you were talking about the new Surface laptops, my bad.

2. Where are you seeing the pricing for those Vivobook configs? I'm not able to pull up equivalent pricing in the US, so far.

That might be a pretty good deal, depending on the real-world performance of the X Elite and the build quality of the ASUS laptop (I've had quite a few, and there have been some great hits and some major misses).


The euro prices were from a leak.

But the US price for the 15" Vivobook with 16GB/1TB and the Elite (from the announcement video) is $1299.

15" m3 with 16gb/1tb from apple.com is 1899


Asus, the brand that can't make an ISO keyboard in Europe.

Either the Surface line or Lenovo's X line can be compared with Apple devices; anything else falls short.


But not the same performance. One of the reasons QC isn't releasing anything but controlled benchmarks is most likely the subpar performance of Windows on ARM. This will be the biggest hurdle for QC's Elite chips: competing with the M series, which is designed in tandem with macOS.


Isn't the unified memory architecture one of the advantages of the Apple M* notebooks for running AI workloads?


Speculation.



In Australia they are about the same price ($1.9k AUD vs $1.6k for the MacBook Air). How much are they where you're at?


M3 15" air with 24gb ram and 1tb ssd is 3400 australian dollars


Does the cheapest Microsoft one come with 24 gigs of RAM and a 1TB SSD?


The NPU runs this Silica model at 1.5 watts. MacBooks cannot even drive multiple monitors in this price range.


MacBooks have NPUs too. It's just that nobody has done anything with them.


The MacBook's NPU is about 3x below the 45 TOPS threshold required for Copilot+ PC branding.


The initial model release had a terrible, frequent issue with emitting the wrong "end of message" token, or never emitting one.[1] That is a very serious issue that breaks chat.

The ones from today still have this issue.[2]

Beyond that, they've been pushing new ONNX features for running LLMs via Phi for about a month now. The ONNX runtime that supports them still isn't out, much less the downstream integration into the iOS/Android runtimes. Heck, the Python package for it isn't supported anywhere but Windows.

It's absolutely wild to me that MS is pulling this stuff with ~0 discussion or reputation repercussions.

I'm a huge ONNX fan and have bet a lot on it; it works great. It was clear to me about 4 months ago that Wintel's "AI PC" buildup meant "ONNX x newer Phi".

It is very frustrating to see an extremely late rush, propped up by Potemkin blog posts that I have to waste time on just to find out they're straight-up lying. It burnt a lot of goodwill that they worked hard to earn.

I am virtually certain that the new Windows AI features previewed yesterday are going to land horribly if they actually try to ship them this year.

[1] https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf...

[2] https://x.com/jpohhhh/status/1793003272187351195
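
For readers wondering what that ONNX path looks like: the onnxruntime-genai preview examples follow roughly the pattern below. This is a sketch from the pre-release docs, so treat the exact names and signatures as subject to change (the model directory here is hypothetical):

  import onnxruntime_genai as og

  model = og.Model("phi-3-mini-4k-instruct-onnx")  # hypothetical local dir
  tokenizer = og.Tokenizer(model)
  stream = tokenizer.create_stream()

  # Phi-3 chat format: the turn should terminate at <|end|>.
  prompt = "<|user|>\nWhy is the sky blue?<|end|>\n<|assistant|>\n"
  params = og.GeneratorParams(model)
  params.set_search_options(max_length=512)
  params.input_ids = tokenizer.encode(prompt)

  # Stream tokens one at a time.
  generator = og.Generator(model, params)
  while not generator.is_done():
      generator.compute_logits()
      generator.generate_next_token()
      print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)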


In the screenshot you shared in the Twitter link [2], the model appears to do everything correctly: it terminated its message with <|end|>, which is correct according to the Phi-3 prompt format. It seems whatever environment you are hosting it in doesn't understand that <|end|>, not <|endoftext|>, should be treated as the EOS string?
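
For anyone hitting this with the Hugging Face stack, the usual workaround is to pass the <|end|> token id to generate() as an explicit EOS. A minimal sketch, assuming the microsoft/Phi-3-mini-4k-instruct checkpoint:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "microsoft/Phi-3-mini-4k-instruct"
  tok = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

  # Phi-3 prompt format; generation should stop at <|end|>, not <|endoftext|>.
  prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"
  inputs = tok(prompt, return_tensors="pt")

  out = model.generate(
      **inputs,
      max_new_tokens=64,
      # Treat <|end|> as an end-of-sequence token so the chat turn terminates.
      eos_token_id=tok.convert_tokens_to_ids("<|end|>"),
  )
  print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))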


Good call. It's an HF Space for Phi Vision; maybe someone jumped the gun / didn't set stuff up properly, or it's splitting <|end|> into multiple tokens.


FYI, there's a recipe for running Phi-3 under ONNX on iOS in the linked repository https://github.com/microsoft/Phi-3CookBook/blob/main/md/03.I...


Yes, that's the Potemkin village that broke this camel's back. It was linked in an announcement blog post yesterday.

You have to build two in-development libraries: one from ToT, and one from a dev branch that exists to make it compile for iOS on a Mac temporarily.

The dev branch doesn't actually exist.

If you use the only branch by the author on the repo, it doesn't work.

The dev branch that doesn't work is a few commits on top of ToT from 2 months ago.

At the end of that non-existent road is a model that can't end messages properly, whether in MyThing.app (which uses llama.cpp), LM Studio, Ollama, or the MS cloud API.

I can't ship on that, and neither can anyone else.


It looks like the Phi-3 Vision model isn't available in GGUF or ONNX. I was hoping there was a GGUF I could use with llamafile.


The bigger news is that Phi-3-Small, Phi-3-Medium and Phi-3-Vision were finally released


"finally"!? Vision wasn't even on the table until today! And they're clearly rushed and fundamentally broken for chat use cases.


I installed phi3:medium last night on my Mac using Ollama and, subjectively, it looks good. I was surprised by the claim that it was better than Mixtral-8x7B.

I largely ignore benchmarks now, but on the other hand, while trying many models myself is easy for simple tests, really using an LLM for an application is a lot of work.


Slightly off topic: what's the smallest LLM I can reasonably use to do language processing and rewriting of a large library of Word documents, for the purposes of querying information and spitting out summaries or detailed information?

My use case is very simple: take 1000 Word documents, each filled with two to three pages of information and pictures, and then output a set of requested information via prompting. Is there something off the shelf, or do I have to build this?


Sounds like a good RAG use-case unless all 1k documents need to be comprehended simultaneously.

Look at H2O.ai: https://github.com/h2oai/h2ogpt
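
If you'd rather roll a minimal version than adopt a whole framework, the retrieval side is only a few lines. A rough sketch, assuming python-docx and sentence-transformers; ask_llm is a placeholder for whichever local model you end up using:

  import numpy as np
  from docx import Document  # pip install python-docx
  from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

  def docx_text(path):
      # Pull the plain text out of one Word document.
      return "\n".join(p.text for p in Document(path).paragraphs)

  # Embed the whole library once; 1000 docs of 2-3 pages is small.
  paths = ["doc1.docx", "doc2.docx"]  # hypothetical file list
  texts = [docx_text(p) for p in paths]
  embedder = SentenceTransformer("all-MiniLM-L6-v2")
  doc_vecs = embedder.encode(texts, normalize_embeddings=True)

  def retrieve(question, k=3):
      # Cosine similarity (vectors are normalized), then top-k documents.
      q = embedder.encode([question], normalize_embeddings=True)[0]
      return [texts[i] for i in np.argsort(doc_vecs @ q)[::-1][:k]]

  def ask_llm(prompt):
      raise NotImplementedError("plug in Phi-3, Llama 3, etc. here")

  question = "What warranty terms are mentioned?"
  context = "\n---\n".join(retrieve(question))
  print(ask_llm(f"Answer using only this context:\n{context}\n\nQ: {question}"))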


Wow, this cookbook is actually really bad? I expected something like the OpenAI or Anthropic cookbooks, but this seems to be AI-generated, low-quality content without any code examples or interesting recipes?

The Phi-3 models are great though; the vision model especially has great potential for low-latency applications (like robotics?)...


Yeah, this is an INSTALL.md masquerading as a cookbook.


https://huggingface.co/collections/microsoft/phi-3-6626e15e9..., all of these models except Phi-3 mini are new.


Looking at the benchmarks, it seems like Phi-3 Small (7B) marginally beats out Llama3-8B on most tasks, which is pretty exciting!


On my benchmark (NYT Connections), Phi-3 Small performs well (8.4) but Llama 3 8B Instruct is still better (12.3). Phi-3 Medium 4k is disappointing and often fails to properly follow the output format.


Have you found either model to be good enough to do anything interesting, reliably?


Llama3-8B is adequate for non-technical summarization or simple categorization.


It also seems to be comparable to GPT-3.5 Turbo, which I find hard to believe. People have obviously found a way around these benchmarks.


Was playing around with this model: why does it return 2 or 3 responses when I ask it for one? I asked it for a JSON response and it generates 2 or 3 at a time. What's with this?
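
This sounds like the end-of-message token issue discussed upthread: if your runner doesn't treat <|end|> as a stop token, the model runs past its answer and starts emitting extra turns. A minimal sketch with the llama-cpp-python bindings (the GGUF path is hypothetical):

  from llama_cpp import Llama  # pip install llama-cpp-python

  llm = Llama(model_path="phi-3-mini-4k-instruct-q4.gguf")  # hypothetical file

  out = llm(
      "<|user|>\nReturn one JSON object with keys name and age.<|end|>\n<|assistant|>\n",
      max_tokens=256,
      # Halt as soon as the model ends its turn, instead of letting it
      # keep sampling and produce a second or third response.
      stop=["<|end|>", "<|endoftext|>"],
  )
  print(out["choices"][0]["text"])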



