
As somebody who works alongside Applied Scientists, helping them with tasks related to model training and deployment: how does one get exposure to lower-level engineering work like optimization, performance, etc.? We have an ML infra team, but their goal is building tools around the platform, not necessarily getting workloads to run optimally.



I think no optimization is possible without profiling. Getting familiar with the tools for understanding a model's performance might be the first step, e.g., https://pytorch.org/tutorials/recipes/recipes/profiler_recip...
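For a rough idea of what that first step can look like, here is a minimal sketch using torch.profiler on CPU. The toy model, the input shape, and the "model_inference" label are all made up for illustration and stand in for whatever workload you actually run.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

# Hypothetical toy model standing in for a real workload.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
inputs = torch.randn(64, 512)  # assumed batch of 64 feature vectors

# Record CPU-side operator timings for one forward pass.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):  # label chosen for this sketch
        with torch.no_grad():
            model(inputs)

# Summarize per-operator statistics, most expensive first.
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(report)
```

The table lists operators such as the linear layers alongside the labeled region, which is usually enough to see where the time goes before trying to change anything.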


Yes - understand first, then fix. And you’ll understand by measuring/profiling things.
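As a small illustration of "measure, then fix", here is a sketch using Python's built-in cProfile. The functions slow_normalize, fast_normalize, and pipeline are hypothetical stand-ins for a preprocessing step, not code from any real project.

```python
import cProfile
import io
import pstats


def slow_normalize(rows):
    # Hypothetical hotspot: max(row) is recomputed for every element.
    return [[x / max(row) for x in row] for row in rows]


def fast_normalize(rows):
    # Same result, but the row maximum is computed once per row.
    out = []
    for row in rows:
        m = max(row)
        out.append([x / m for x in row])
    return out


def pipeline(rows):
    return slow_normalize(rows)


rows = [[float(i + j + 1) for j in range(200)] for i in range(100)]

# Step 1: measure. Profile the pipeline as it is today.
profiler = cProfile.Profile()
profiler.enable()
result = pipeline(rows)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)

# Step 2: fix, and check the fix did not change the output.
assert fast_normalize(rows) == result
```

The report should show slow_normalize and the repeated max calls near the top, which points at the fix before any code is changed.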

I’d also recommend the detailed PyTorch optimization case studies by Paul Bridger:

https://paulbridger.com/


Brendan Gregg's work on system performance and profiling is a good place to start. A lot of ML perf boils down to Linux perf or what the heck is happening in an HPC scheduling system like SLURM. https://www.brendangregg.com/linuxperf.html



