
I like your description because it's relatively succinct and intuitively suggests why the modified softmax can help the model handle edge cases. It's nice to ask: How could the model realistically learn to correctly handle situation X?


