Hacker News

I assume that multiplying by a constant factor shouldn't matter, since the learning rate already scales the gradient. It might just mean that the learning rate needs to be set lower or higher with this method.



The question is then really about which method makes it easier to tune parameters or which helps intuition the most.
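A minimal sketch of that argument, assuming plain gradient descent and assuming the constant factor is applied to the loss (and therefore to its gradient). The toy loss `(w - 3)^2`, the factor `c`, and the helper names are hypothetical, chosen only for illustration. Scaling the gradient by `c` at learning rate `lr` produces the same trajectory as the unscaled gradient at learning rate `c * lr`:

```python
def grad_loss(w: float) -> float:
    # Gradient of the hypothetical toy loss L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)


def gradient_descent(grad_fn, w0: float, lr: float, steps: int) -> list[float]:
    # Plain gradient descent; returns every iterate, including the start point
    w = w0
    trajectory = [w]
    for _ in range(steps):
        w = w - lr * grad_fn(w)
        trajectory.append(w)
    return trajectory


c = 0.5        # hypothetical constant factor applied to the loss
base_lr = 0.1  # hypothetical learning rate

# Option A: scale the gradient by c, keep the learning rate
scaled = gradient_descent(lambda w: c * grad_loss(w), w0=0.0, lr=base_lr, steps=20)

# Option B: leave the gradient alone, fold c into the learning rate
rescaled_lr = gradient_descent(grad_loss, w0=0.0, lr=c * base_lr, steps=20)

# Largest difference between the two trajectories (should be ~0)
max_diff = max(abs(a - b) for a, b in zip(scaled, rescaled_lr))
print(max_diff)
```

The equivalence is exact for plain gradient descent (up to floating-point rounding). Adaptive optimizers such as Adam normalize by the gradient's running magnitude, so there a constant factor on the loss largely cancels out instead of acting like a learning-rate change.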


This is a good way to think about it.



