Hacker News

This is refreshing. I've been learning machine learning through Kaggle recently, and I'm getting a bit tired of the "tune the hyperparameters" culture. It rewards people who have the pockets to spend on computing power and the time to try every parameter. I'm starting to find problems that don't have a simple accuracy metric more interesting: they force me to understand the problem and think in new ways, instead of going down a checklist of optimizations.



I'm also starting to follow people and communities that work with deep learning in new ways. Here are some of my favorites:

[1] http://colah.github.io/

[2] https://iamtrask.github.io/

[3] https://distill.pub

[4] https://experiments.withgoogle.com/ai


You can be a little less brute-force if you use something like hyperopt (http://hyperopt.github.io/hyperopt/) or hyperband (https://github.com/zygmuntz/hyperband) for tuning hyperparameters (Bayesian and multi-armed-bandit optimization, respectively). If you're more comfortable with R, caret supports some of these techniques, and mlr has a model-based optimization package, mlrMBO (https://github.com/mlr-org/mlrMBO).

These types of techniques should let you explore the hyperparameter space much more quickly (and cheaply!), but I agree - having money to burn on EC2 (or access to powerful GPUs) will still be a major factor in tuning models.
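To make the bandit idea concrete, here is a minimal pure-Python sketch of successive halving, the core loop behind hyperband: evaluate many configurations cheaply, then repeatedly keep the best 1/eta of them with a larger budget. The function names and the toy objective are illustrative, not the hyperband library's actual API.

```python
import random

def successive_halving(configs, evaluate, budget=1, eta=3, rounds=3):
    """Keep the best 1/eta of the configs each round, multiplying the
    per-config budget by eta. This is the core of hyperband's inner loop."""
    survivors = list(configs)
    for _ in range(rounds):
        # Score every surviving config at the current budget (lower loss wins).
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = scored[:max(1, len(survivors) // eta)]
        budget *= eta
        if len(survivors) == 1:
            break
    return survivors[0]

# Toy objective standing in for "train a model with learning rate lr for
# `budget` epochs": here the loss ignores the budget entirely.
def toy_loss(lr, budget):
    return (lr - 0.1) ** 2

random.seed(0)
candidates = [random.uniform(0.001, 1.0) for _ in range(27)]
best = successive_halving(candidates, toy_loss, rounds=3)
print(best)
```

With 27 candidates and eta=3, only one configuration survives three rounds, but most of the candidates were only ever evaluated at the smallest budget, which is where the compute savings come from.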


Ha, it reminds me of what Andrej Karpathy said: "Kaggle competitions need some kind of complexity/compute penalty. I imagine I must be at least the millionth person who has said this." [1] It would be interesting to collaborate/compete on more creative tasks and have different metrics for success.

[1] https://twitter.com/karpathy/status/913619934575390720


So true. Another reason to put constraints on Kaggle competitions is the production environment. How many winning models have actually been used in production? I suspect the number is near zero. High accuracy with high latency makes an ML/DL artefact unusable in production, because from the user's point of view speed is much more valuable than the difference between 97% and 98% accuracy.



