Really great end-to-end write-up.

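Just a few things: in the general case it's better not to use MSE after a sigmoid, because convergence is slow. The MSE gradient gets multiplied by the sigmoid's derivative, which goes to zero when the unit saturates, so a confidently wrong prediction barely gets corrected. Cross-entropy doesn't have that problem.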
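
To make the slow-convergence point concrete, here is a minimal sketch comparing the gradient with respect to the pre-sigmoid value z for the two losses. The function names are mine, not from the article, and it assumes a single output unit with a binary target y.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad(z, y):
    # d/dz of (sigmoid(z) - y)^2: the chain rule brings in p * (1 - p),
    # which shrinks toward zero as the sigmoid saturates.
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

def bce_grad(z, y):
    # d/dz of binary cross-entropy on sigmoid(z): the p * (1 - p) factor
    # cancels, leaving just the prediction error.
    return sigmoid(z) - y

# Target is 1.0; more negative z means a more confidently wrong prediction.
for z in (0.0, -4.0, -8.0):
    print(f"z={z:5.1f}  mse_grad={mse_grad(z, 1.0):+.6f}  bce_grad={bce_grad(z, 1.0):+.6f}")
```

The worse the prediction gets, the smaller the MSE gradient becomes, while the cross-entropy gradient stays close to -1.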

And the "logits" variable doesn't actually hold logits, it holds probabilities. Logits are what you have before applying the sigmoid activation.
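
A small sketch of the naming distinction, with made-up values and variable names of my own rather than the article's code: logits are unbounded real scores, the sigmoid maps them into (0, 1), and the logit function (log-odds) maps them back.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # Inverse of the sigmoid: the log-odds of probability p.
    return math.log(p / (1.0 - p))

logits = [-2.0, 0.0, 3.0]               # raw pre-activation scores, any real number
probs = [sigmoid(z) for z in logits]    # after the sigmoid: values in (0, 1)
recovered = [logit(p) for p in probs]   # back to the original scores

print(probs)
print(recovered)
```

So a variable that holds the sigmoid's output is better named something like probs, keeping logits for the values fed into it.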

