Hacker News

There is nothing wrong with ONNX itself; the problems stem from limitations of PyTorch.

1. PyTorch model files are neither portable nor self-contained: they are pickled Python objects containing weights, so you need the original Python class code to load and run them.
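You can see the underlying problem with plain pickle, no torch required. TinyModel here is a made-up stand-in for an nn.Module:

```python
import pickle

class TinyModel:
    """Stand-in for an nn.Module; holds 'weights' as plain data."""
    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]

# Pickling stores a *reference* to TinyModel (module + class name), not its code.
blob = pickle.dumps(TinyModel())

# Simulate loading on a machine that doesn't have the class code.
del TinyModel
try:
    pickle.loads(blob)  # fails: pickle needs the class to rebuild the object
except AttributeError as e:
    print("unpickling failed:", e)
```

The same lookup happens when torch.load deserializes a whole pickled model: if the class isn't importable in the target environment, loading fails.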

Because running a model requires real Python code, PyTorch suffers from numerous issues when porting to non-Python platforms and formats such as ONNX.

PyTorch offers a way to export to ONNX, but you will encounter various errors. [1]

Sure, you might be lucky enough to troubleshoot a specific model into exporting to ONNX, but if your objective is to export arbitrary models from a model zoo, say the 964 models in TIMM, it is almost impossible.

2. There are organizational and cultural problems too. Because of the above, a PyTorch model needs to be designed with portability in mind from the beginning. But porting and serving models is what engineers do, whereas researchers, who design the models, don't care about portability when writing papers. So it is often hard to use SOTA models that come from academic research.

[1] https://pytorch.org/docs/stable/onnx.html#limitations




> PyTorch offers a way to export to ONNX, but you will encounter various errors. [1]

I mean sure, there are limitations, but in my experience this greatly exaggerates their impact. I'd be curious to hear from anyone for whom these have been serious blockers. I've been exporting PyTorch models to ONNX (for CV applications) for the last couple of years without any major issues, and any issues that did pop up were resolved in a matter of hours.


So I tried converting an ASR model[0] to ONNX about a year or two back. It was really painful. The pain could largely be ascribed to:

(1) code that is very dynamic, making it hard for PyTorch to convert the modules to TorchScript (which it does before converting them to ONNX);

(2) ops that were simply not available in ONNX, most notably torch.fft, among others.

[0] https://github.com/burchim/EfficientConformer
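A small illustration of point (1): torch.jit.trace records only the execution path taken by the example input, so data-dependent branches get silently baked in. Gate here is a made-up toy module:

```python
import torch

class Gate(torch.nn.Module):
    # Data-dependent control flow: the branch taken depends on input *values*.
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return x * 0

model = Gate()
pos = torch.ones(3)
traced = torch.jit.trace(model, pos)  # records only the path taken for `pos`

neg = -torch.ones(3)
# Eager mode takes the x * 0 branch; the traced graph baked in x * 2.
assert torch.equal(model(neg), torch.zeros(3))
assert torch.equal(traced(neg), neg * 2)
```

torch.jit.script handles some of this control flow, but only for the subset of Python it supports, which is where highly dynamic research code tends to break.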


How did this happen? A pickle is not a sensible storage format: it's insecure, hard to version, and not very portable. Isn't a model basically a big matrix of numbers?


Not in PyTorch. A model is a set of Python dictionaries containing state plus Python module/class objects. I don't know why the PyTorch team did this, but that happened. Maybe it boils down to point #2 above.
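For what it's worth, the more portable-within-Python convention is to save only the state_dict, a plain dict of tensors, and keep the class code separate. A quick sketch (Net is a made-up toy module):

```python
import io
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(3, 1)
    def forward(self, x):
        return self.fc(x)

net = Net()
buf = io.BytesIO()
torch.save(net.state_dict(), buf)  # save only tensors, not the class object
buf.seek(0)

fresh = Net()  # the class code is still required to rebuild the model
fresh.load_state_dict(torch.load(buf))
x = torch.randn(2, 3)
assert torch.equal(net(x), fresh(x))
```

Even this still needs the Python class around to reconstruct the architecture, which is exactly the self-containedness problem the parent comment describes.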





