I was recently playing with Apple's CoreML and made several painful observations about the tooling. It's not enough for a long read, but it should be enough for an HN post.
In short, you can take a simple BERT-like encoder model in PyTorch, convert it into an f32 CoreML checkpoint, and run it on CPU or GPU, but not NPU. Let's unpack this.
A simple, extensible format for exchanging common ANN architectures is a big deal for anyone who uses more than one framework or programming language to run the same model. ONNX is the closest we have to such a standard, but it's hard to call anything Protobuf-related "simple." If you start with ONNX, you quickly realize that Apple has no tool for converting ONNX->CoreML. And if you want to go ONNX->PyTorch->CoreML instead, be warned that PyTorch has no ONNX import functionality (issue #21683). The PyTorch->CoreML step itself at least works; a sketch follows below.
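Here is a minimal sketch of that conversion with `coremltools`, assuming the usual trace-then-convert route; `TinyEncoder` is a hypothetical stand-in for a real BERT-like encoder, and every name and shape is illustrative:

```python
import numpy as np
import torch
import coremltools as ct

class TinyEncoder(torch.nn.Module):
    """Hypothetical stand-in for a BERT-like encoder: embed + mean-pool."""
    def __init__(self, vocab_size=30522, dim=256):
        super().__init__()
        self.embeddings = torch.nn.Embedding(vocab_size, dim)

    def forward(self, input_ids):
        return self.embeddings(input_ids).mean(dim=1)

model = TinyEncoder().eval()
example = torch.randint(0, 30522, (1, 128))  # (batch, sequence length)
traced = torch.jit.trace(model, example)

# `convert_to="mlprogram"` produces an .mlpackage, the newer CoreML format.
mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="input_ids", shape=(1, 128), dtype=np.int32)],
)
mlmodel.save("TinyEncoder.mlpackage")
```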
When you convert to CoreML, you can choose the `precision`. Evaluating modern models in full precision seems wasteful, so I reached for half-precision variants over single precision. Unlike ONNX, CoreML can't go below 16-bit data types. The 16-bit variants also didn't work for me, so UForm is currently stuck with 32 bits. This means our iOS-targeting checkpoints are heavier than the PyTorch, SafeTensors, and ONNX exports of the same model (bf16, bf16, and u8, respectively).
https://huggingface.co/unum-cloud/uform3-image-text-english-...
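For reference, the precision choice is a single flag on the conversion call. `ct.precision` exposes only FLOAT16 and FLOAT32, so there is no path to anything smaller here. A sketch reusing the traced model from the snippet above:

```python
import numpy as np
import coremltools as ct

# Same traced model and inputs as above; only the precision flag changes.
# ct.precision has just FLOAT16 and FLOAT32 - nothing below 16 bits.
mlmodel_fp16 = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="input_ids", shape=(1, 128), dtype=np.int32)],
    compute_precision=ct.precision.FLOAT16,
)
mlmodel_fp16.save("TinyEncoder-f16.mlpackage")
```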
CoreML tooling lets you specify a `RangeDim`, marking the variable-length dimensions of the input tensor. This is handy if you want to support different batch sizes. ONNX has the same functionality and it works fine, while CoreML fails. So, for now, I stick to batch size one.
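For reference, this is how a flexible batch dimension is supposed to be declared (same hypothetical setup as above); this is the part where CoreML failed for me:

```python
import numpy as np
import coremltools as ct

# Declare the batch dimension as variable, from 1 to 64, defaulting to 1.
batch = ct.RangeDim(lower_bound=1, upper_bound=64, default=1)
mlmodel_flexible = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="input_ids",
                          shape=ct.Shape(shape=(batch, 128)),
                          dtype=np.int32)],
)
```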
Last, Xcode provides a profiler to measure your models' latency and throughput. The profiler covers CPUs, GPUs, and NPUs, but no model I've tested could run on the NPU. I assume it is reserved for first-party models. Interestingly, Apple Silicon also contains specialized AMX coprocessors next to the performance and efficiency core clusters. Those differ from Intel's AMX (Advanced Matrix Extensions) and Arm's SME (Scalable Matrix Extension). They aren't publicly documented, but their whole purpose is AI acceleration. Running inference on the CPU of an M2 Pro is 10x slower than on its GPU, so AMX is probably not being used. It would be great to get clarification from Apple on the purpose of all those specialized enclaves.
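The CPU-vs-GPU gap is easy to reproduce from Python by loading the same checkpoint with different compute-unit hints. A rough timing sketch (macOS only), reusing the hypothetical `TinyEncoder.mlpackage` from above:

```python
import time
import numpy as np
import coremltools as ct

x = {"input_ids": np.random.randint(0, 30522, (1, 128), dtype=np.int32)}

# CPU_AND_NE requests the Neural Engine; per the note above,
# third-party models may silently fall back to other units.
for units in (ct.ComputeUnit.CPU_ONLY,
              ct.ComputeUnit.CPU_AND_GPU,
              ct.ComputeUnit.CPU_AND_NE):
    m = ct.models.MLModel("TinyEncoder.mlpackage", compute_units=units)
    m.predict(x)  # warm-up
    start = time.perf_counter()
    for _ in range(100):
        m.predict(x)
    print(units.name, f"{(time.perf_counter() - start) / 100 * 1e3:.2f} ms/call")
```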
---
Model aside, there are a lot of other issues with the developer experience.
Let's address the Xcode in the room. I mostly write code in VS Code. When it gets too slow and buggy, I switch to the native Sublime Text. In the Apple ecosystem, there is no good second option, so once Xcode fails, you are lost.
The most common issues I've faced were with adding, updating, and removing app dependencies. Another big one is running a build and wondering whether it's the latest code or some internally cached version. When something breaks, you must navigate plist (XML) files to clean up the mess. That's similar to manually editing `yarn.lock` or `poetry.lock`, if you are coming from JS or Python.
One of VS Code's handiest features is "Format on Save." Keeping the code sane is a necessity for popular open-source projects. Xcode has no such feature. Apple has a first-party tool called `swift-format`, which reads its settings from a `.swift-format` config, but Xcode doesn't respect that config. Moreover, I couldn't make the `swift-format` tool mimic the native Xcode style for empty lines.
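For reference, a minimal `.swift-format` config with illustrative values; `maximumBlankLines` is the closest knob to the empty-line behavior mentioned above:

```json
{
  "version": 1,
  "lineLength": 100,
  "indentation": { "spaces": 4 },
  "maximumBlankLines": 1,
  "respectsExistingLineBreaks": true
}
```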
Lastly, I couldn't find a way to use Sphinx for Swift and Objective-C documentation. Generating API references for projects with many language bindings is extremely hard, and Swift isn't making it any easier.
---
Overall, Apple ships some fantastic hardware, but we are in the very early days of software adoption, and I hope these notes help the company patch some rough corners before WWDC.